No rationale for 1 variable per 10 events criterion for binary logistic regression analysis

Authors

  • Maarten van Smeden
  • Joris A H de Groot
  • Karel G M Moons
  • Gary S Collins
  • Douglas G Altman
  • Marinus J C Eijkemans
  • Johannes B Reitsma
Abstract

BACKGROUND Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion, only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies.

METHODS The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and by a modified estimation procedure, known as Firth's correction, are compared.

RESULTS The results show that, besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect ('separation'). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth's correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation.

CONCLUSIONS The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
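To make the two central quantities of the abstract concrete, the following is a minimal Python sketch and not the authors' simulation code. It assumes EPV is computed as the size of the smaller outcome group divided by the number of covariates, and it checks separation for a single covariate only (the general multi-covariate check is more involved). The function names, the standard normal covariate, and the parameters `beta`, `n_sims` and `seed` are hypothetical choices for illustration.

```python
import math
import random


def events_per_variable(outcomes, n_covariates):
    """EPV, assumed here to be the smaller outcome-group size per covariate."""
    events = sum(outcomes)
    minority = min(events, len(outcomes) - events)
    return minority / n_covariates


def single_covariate_separates(x, y):
    """Return True if one covariate alone perfectly splits the 0/1 outcomes."""
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        # Degenerate case: only one outcome class observed, so finite
        # maximum likelihood estimates do not exist either.
        return True
    return max(x0) < min(x1) or max(x1) < min(x0)


def separation_rate(n, beta, n_sims=500, seed=1):
    """Monte Carlo estimate of how often a simulated data set is separated.

    Assumes one standard normal covariate and a logistic model with
    intercept 0 and slope `beta` (illustrative choices only).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-beta * xi)) else 0
             for xi in x]
        hits += single_covariate_separates(x, y)
    return hits / n_sims
```

Calling `separation_rate` at several values of `n` gives a rough sense of how the share of separated data sets varies with sample size under these assumptions. How such data sets are identified and handled is the choice that the abstract reports as changing simulation results substantially.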

Similar resources

Binary Regression With a Misclassified Response Variable in Diabetes Data

Objectives: Categorical data analysis is very important in statistics and medical sciences. When the binary response variable is misclassified, the results of fitting the model will be biased in estimating adjusted odds ratios. The present study aimed to use a method to detect and correct misclassification error in the response variable of Type 2 Diabetes Mellitus (T2DM), applying binary ...

A New Nonlinear Specification of Structural Breaks for Money Demand in Iran

In a structural time series regression model, binary variables have been used to quantify qualitative or categorical events such as political and economic structural breaks, regions, age groups, etc. The use of binary dummy variables is not reasonable because the effect of an event decreases (increases) gradually over time, not at once. The simple and basic idea in this paper i...

Application of Ordinal Logistic Regression Models in Quality of Life Studies

 Background & Objectives: Due to the increasing tendency to measure the quality of life in recent years and the extensive quality of life questionnaires, it is important to determine the appropriate method of analyzing data derived from these studies. The aim of the present study was to introduce ordinal logistic regression models as an appropriate method for analyzing the data of quality of li...

Logistic Regression Tree Analysis

This chapter describes a tree-structured extension and generalization of the logistic regression method for fitting models to a binary-valued response variable. The technique overcomes a significant disadvantage of logistic regression, which is interpretability of the model in the face of multicollinearity and Simpson’s paradox. Section 1 summarizes the statistical theory underlying the logisti...

Phase II logistic profile monitoring

In many industrial and non-industrial applications, the quality of a process or product is characterized by a relationship between a response variable and one or more explanatory variables. This relationship is referred to as a profile. In the past decade, profile monitoring has been extensively studied under the normal response variable, but little attention has been paid to the profile with the ...

Journal title:

Volume 16  Issue 

Pages  -

Publication date 2016